Abstract. The Bernoulli sieve is a recursive construction of a random composition (ordered
partition) of an integer $n$. This composition can be induced by sampling from a random discrete
distribution whose frequencies are the sizes of the component intervals of a stick-breaking
interval partition of $[0,1]$. We exploit the Markov property of the composition and its renewal
representation to derive asymptotics of the moments and to prove a central limit theorem for
the number of parts.

1. The Bernoulli sieve can be seen as a generalisation of the 'game' found in [3]. The first
round of the game starts with $n$ players and amounts to tossing a coin with probability $X_1$
for tails. Each of the players tosses once, and the players flipping tails must drop out. If
all $n$ get heads the trial is disqualified and must be repeated completely with all $n$ players, as
many times as necessary until some players do quit. If at least one player remains after the
first round, the second round continues with the remaining players, who must toss another coin
with probability $X_2$ for tails. The game lasts with probabilities $X_3, X_4, \ldots$ for tails until all
players are sorted out. It is assumed that the probabilities $X_1, X_2, \ldots$ are independent random
variables with a given distribution $\omega$ on $]0,1[$, and that given $X_j$ the individual outcomes at
round $j$ are conditionally independent. It follows readily that, as far as only the number of
players is concerned, the outcome of a round depends on the past solely through the number
of players who proceed that far.

A random composition $C_n$ of the integer $n$ arises, with part $j$ being the number of players
dropping out at round $j$. In this paper we shall focus on some properties of $C_n$; in particular
we are interested in the distribution of the number of parts of the composition, which may be
thought of as the duration of the game.
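The game can be sketched in a few lines of Python. This is only an illustration: the names `bernoulli_sieve` and `sample_x` are ours, the uniform choice for $\omega$ is an assumption made for the example, and a disqualified round is replayed with a freshly drawn tails probability, which is our reading of the rules (it agrees with the interval-partition description of the same composition).

```python
import random

def bernoulli_sieve(n, sample_x, rng):
    """Return the parts of one random composition of n produced by the game."""
    parts = []
    remaining = n
    while remaining > 0:
        x = sample_x(rng)  # probability of tails in this round
        # each remaining player independently flips tails with probability x
        dropped = sum(1 for _ in range(remaining) if rng.random() < x)
        if dropped == 0:
            continue  # nobody quit: the round is discarded and replayed
        parts.append(dropped)
        remaining -= dropped
    return parts

rng = random.Random(2024)
# omega = uniform distribution on [0, 1) (illustrative assumption)
composition = bernoulli_sieve(20, lambda r: r.random(), rng)
```

The length of the returned list is one realisation of the number of parts, that is, of the duration of the game.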

There is a natural way to settle all $C_n$'s on the same probability space in a consistent
fashion. Consider a random interval partition of $[0,1]$ by points $1-(1-X_1)(1-X_2)\cdots(1-X_j)$,
$j = 1, 2, \ldots$, and assign each player a random uniform tag, independent of the $X_j$'s. The tags
group within the intervals, and recording the cluster sizes, from the left to the right, yields
a composition (intervals containing no tags are ignored). To establish equivalence with the
coin-tossing construction we only need to note that the chance for a particular player to remain
for at least $j$ rounds in the game is precisely $(1-X_1)(1-X_2)\cdots(1-X_j)$.

The game in [3] corresponds to $\omega$ supported by a single point, in which case the $X_j$'s are
all equal. The composition is then induced by sampling from a geometric distribution and, of
course, has appeared many times in the literature under different guises. Karlin distinguished
this case in the context of a general occupancy problem with infinitely many boxes and derived
distributions for the number of parts, number of singletons, doubletons etc. The feature studied
in [?] was the probability that there is exactly one winner, that is, a player remaining at the last
round (so that the last part of the composition is 1).

When $\omega$ is a Beta$(1,\theta)$ distribution the law of $C_n$ is known as the ordered Ewens sampling
formula (ESF). This structure is well understood, see [1] for a recent account and [?], [?] for
generalisations.

Our interest in the construction arose in connection with the regenerative compositions [?].
Within this more general setting the Bernoulli sieve composition may be seen as a discretisation
of a subordinator with finite Lévy measure and zero drift. In what follows we shall treat
general measures, with the only constraints that $\omega$ is not supported by a geometric sequence like
$(1-x^j)$ (in particular, sampling from the geometric distribution is ruled out) and that
$\omega$ does not put too much mass near the endpoints of $[0,1]$. Our method relies on renewal
theory and the analysis of 'divide-and-conquer' recurrences, techniques intended to replace
the independence-based tools available in the ESF case, see [1].

2. By exchangeability among the players the compositions $C_n$ are sampling consistent for
different values of $n$. That is to say, if a part of $C_n$ is selected at random, in a size-biased fashion,
and decremented by one unit then the resulting composition of $n-1$ (possibly with fewer parts)
has the same distribution as $C_{n-1}$. The sequence $(C_n)$ forms a composition structure in the
sense of [?], [?] and determines a random exchangeable composition of a countable set.

There are two further constructions of $C_n$ featuring renewal and Markov properties.

The renewal representation is obtained from the stick-breaking construction by applying the
transformation $\phi(x) = -\log(1-x)$, which maps $[0,1]$ onto $[0,\infty]$. Consider the range $\mathcal{R}$ of
a renewal process with initial state 0 and step distribution $\Omega$, the image of $\omega$ under $\phi$, and let
$E_1,\ldots,E_n$ be the increasing order statistics from the standard exponential distribution (which
correspond to exponentially distributed tags). The points of $\mathcal{R}$ induce a partition of $[0,\infty]$ into
intervals making up the complement $\mathcal{R}^c = [0,\infty]\setminus\mathcal{R}$, and the points $E_j$ group within the
intervals; in these terms the composition $C_n$ becomes a record of all nonzero cluster sizes, from
the left to the right.
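The renewal representation translates directly into a sampler. The following is a minimal sketch under our own naming (`sieve_via_renewal`, `sample_x` are hypothetical helpers): renewal steps are drawn as $\phi(X) = -\log(1-X)$ with $X$ distributed according to $\omega$, here taken uniform purely for illustration.

```python
import math
import random

def sieve_via_renewal(n, sample_x, rng):
    """Group n exponential tags by the points of a renewal process started at 0."""
    tags = sorted(rng.expovariate(1.0) for _ in range(n))  # E_1 <= ... <= E_n
    parts = []
    i = 0          # index of the first tag not yet assigned to a cluster
    point = 0.0    # current renewal point
    while i < n:
        x = sample_x(rng)
        point += -math.log(1.0 - x)  # next renewal point, step phi(X)
        count = 0
        while i < n and tags[i] < point:
            i += 1
            count += 1
        if count:  # intervals containing no tags are ignored
            parts.append(count)
    return parts

rng = random.Random(7)
composition = sieve_via_renewal(20, lambda r: r.random(), rng)
```

Under the stated assumptions this sampler and the coin-tossing game should produce compositions with the same law; the sketch itself does not verify this.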

The Markov chain representation of $C_n$ stems from the following first-part deletion property of
$C_n$. Given that the first part of $C_n$ is $m$, the composition of $n-m$ obtained by removing this part
has the same distribution as $C_{n-m}$. This property is obvious in the renewal context: it follows
from the regenerative property of $\mathcal{R}$ (applied at the leftmost point of $\mathcal{R}$ to the right of $E_1$) taken
together with the memoryless property of the exponential distribution. The deletion property
implies that the parts of $C_n$ can be viewed as decrements of a decreasing Markov chain $Q_n$
which has state space $\{0,1,\ldots,n\}$, starts at state $n$ and eventually gets absorbed at 0. The
one-step transition probability from $n$ to $n-m$ is

\[ q(n,m) = \frac{w(n,m)}{1-w(n,0)}, \qquad m = 1,\ldots,n, \tag{1} \]

where

\[ w(n,m) = \binom{n}{m}\int_0^1 x^m(1-x)^{n-m}\,\omega(dx), \qquad m = 0,1,\ldots,n, \]

are the binomial moments of $\omega$. A similar expression can be given in terms of $\Omega$, with $1-e^{-z}$
in place of $x$. The quantity $1-w(n,0) = w(n,1)+\ldots+w(n,n)$ will appear throughout as a
normalising factor; we therefore use the shorter notation $W(n) = 1-w(n,0)$. In other terms,

\[ W(n) = \int_0^\infty (1-e^{-nz})\,\Omega(dz) \]

is the characteristic exponent of the measure $\Omega$ thought of as a Lévy measure.



For a given composition $(n_1,\ldots,n_k)$ of $n$ the probability that $C_n$ assumes this value is of
the product form

\[ p(n_1,\ldots,n_k) = q(n_1+\ldots+n_k,\,n_1)\,q(n_2+\ldots+n_k,\,n_2)\cdots q(n_k,\,n_k), \tag{2} \]

because this is the probability that the chain $Q_n$ has decrements $n_1,\ldots,n_k$ before absorption
(recall that the second argument of $q$ is the size of the decrement).
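For a finitely supported $\omega$ the quantities in (1) and (2) can be computed exactly. The sketch below assumes a hypothetical two-atom measure (mass 1/3 at $x = 1/2$ and mass 2/3 at $x = 1/3$), represents it as a list of `(atom, weight)` pairs, and reads the second argument of $q$ as the decrement, in line with (1); the function names are ours.

```python
from fractions import Fraction
from math import comb

def w(n, m, atoms):
    """Binomial moment w(n, m) of omega = sum of weight * delta_x over atoms."""
    return sum(p * comb(n, m) * x**m * (1 - x)**(n - m) for x, p in atoms)

def q(n, m, atoms):
    """One-step probability of the decrement m from state n, as in (1)."""
    return w(n, m, atoms) / (1 - w(n, 0, atoms))

def composition_probability(parts, atoms):
    """Product formula (2): multiply transition probabilities along the path."""
    prob, remaining = Fraction(1), sum(parts)
    for part in parts:
        prob *= q(remaining, part, atoms)
        remaining -= part
    return prob

def compositions(n):
    """Generate all compositions of n as tuples of positive integers."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest

# hypothetical omega: mass 1/3 at x = 1/2 and mass 2/3 at x = 1/3
atoms = [(Fraction(1, 2), Fraction(1, 3)), (Fraction(1, 3), Fraction(2, 3))]
total = sum(composition_probability(c, atoms) for c in compositions(5))
```

Exact rational arithmetic makes it possible to confirm that the probabilities of all 16 compositions of 5 add up to one.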



3. We will be interested in the first instance in the number of parts $K_n$ of the composition $C_n$.
It follows from $\omega\{1\} = 0$ that $q(n,m) > 0$ for all $n > m \ge 1$, and $K_n$ goes to infinity with $n$.

Observe that the sizes of intervals comprising the partition of $[0,1]$ are $Y_j = (1-X_1)\cdots(1-X_{j-1})X_j$.
Rephrasing the stick-breaking interpretation, $K_n$ is the number of boxes occupied by
at least one of $n$ balls, with probability $Y_j$ of hitting the $j$th box. Karlin's paper [12] is a
basic reference on the model with infinitely many boxes and nonrandom frequencies, and some
information on $K_n$ can be extracted from Karlin's results by conditioning on $(Y_j)$.

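Consider two conditions on $\omega$ which limit concentration of mass near 1 and near 0:

\[ \mu := \int_0^1 |\log(1-x)|\,\omega(dx) < \infty, \tag{3} \]

\[ \int_0^1 |\log x|\,\omega(dx) < \infty. \tag{4} \]

Reformulated, the condition (3) says that the first moment of $\Omega$ is finite:

\[ \mu = \int_0^\infty z\,\Omega(dz) < \infty. \]

Proposition 1 If (3) and (4) hold then, as $n\to\infty$,

\[ K_n \sim \frac{1}{\mu}\log n \quad\text{almost surely.} \tag{5} \]

Proof. By the strong law of large numbers we have for $j\to\infty$

\[ \log Y_j = \sum_{i=1}^{j-1}\log(1-X_i) + \log X_j \sim -\mu j \]

(condition (4) is necessary and sufficient to have the second term negligible). From this relation
we have (for Karlin's $\alpha$ on p. 376 of [12])

\[ \#\{j : Y_j > 1/x\} \sim \frac{1}{\mu}\log x, \quad\text{as } x\to\infty, \]

almost surely. By [12], Theorem 1',

\[ \mathbb{E}\,(K_n \mid (Y_j)) \sim \frac{1}{\mu}\log n, \]

and by Theorem 8 from that paper the statement holds conditionally on $(Y_j)$, hence also
unconditionally. □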

Note that 'deconditioning' by itself does not allow us to draw conclusions about the asymptotics of $\mathbb{E}K_n$
(see Proposition 2 to follow). Results of [12] could be used further to derive the asymptotics
of the conditional variance of $K_n$ and to obtain a conditional central limit theorem. We will
not dwell on converting these results into their unconditional counterparts, but rather will take an
approach based on the renewal features of our model.

4. Let $F_n$ be the first part of $C_n$, with distribution $\mathbb{P}(F_n = m) = q(n,m)$. The Markov property
of the composition implies that $K_n$ satisfies a distributional equation

\[ K_n \stackrel{d}{=} 1 + K'_{n-F_n} \tag{6} \]

where $F_n, K'_1, K'_2, \ldots$ are independent and each $K'_j$ has the same distribution as $K_j$. Averaging in
(6) we see that $a_n = \mathbb{E}K_n$ satisfies a linear recursion

\[ a_n = 1 + \sum_{m=1}^{n} q(n,m)\,a_{n-m} \tag{7} \]

with boundary value $a_0 = 0$.
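Recursion (7) is easy to iterate numerically. As a sketch, assume $\omega$ is uniform on $[0,1]$ (the ESF case with $\theta = 1$), for which $w(n,m) = 1/(n+1)$ and hence $q(n,m) = 1/n$; the function names below are ours.

```python
from fractions import Fraction

def expected_parts(n_max, q):
    """Iterate a_n = 1 + sum_{m=1}^{n} q(n, m) a_{n-m} with a_0 = 0."""
    a = [Fraction(0)]
    for n in range(1, n_max + 1):
        a.append(1 + sum(q(n, m) * a[n - m] for m in range(1, n + 1)))
    return a

def q_uniform(n, m):
    """q(n, m) = 1/n when omega is uniform on [0, 1] (illustrative assumption)."""
    return Fraction(1, n)

a = expected_parts(10, q_uniform)
harmonic = [sum(Fraction(1, k) for k in range(1, n + 1)) for n in range(11)]
```

In this particular case the solution coincides with the harmonic numbers, which already shows the logarithmic growth of the expected number of parts.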

Remark. Recursions akin to (7) are common in the average-case analysis of algorithms, see
references in [?]. A recent dissertation by Bruhn [2] is devoted solely to them. Some results
of Bruhn are reproduced in Rösler [20] along with a distributional analysis of equations more
general than (6). The class of recursions treated in the cited work relates to the assumption
that the weights $q(n,\cdot)$, considered as measures with support $\{1/n, 2/n, \ldots, n/n\}$, satisfy an
equiboundedness condition and converge weakly to some measure on $[0,1]$.

In our case the convergence of $q(n,\cdot)$ to $\omega$ is clear from the convergence of moments (which
amounts to Bernstein's trick used to prove the Weierstrass uniform approximation theorem).
Above that, the Bruhn-Rösler conditions certainly hold when $\omega$ has a smooth density. However,
we were unable to check their (very technical) conditions for general measures $\omega$ and will
rely on the special structure (1). A specific feature of the class of recursions studied here is that
we have a canonical renewal process as a part of the model, while Bruhn and Rösler needed to
construct an auxiliary renewal process to 'mimic' the recursion.
Remark. We formulate the next fact as $L^1([0,1],\omega)$-approximability of the logarithm by
the (ordinary) Bernstein polynomials, but in the sequel we will also make use of the formula (8).
There is a variety of closely related results in the literature: the best known is the aforementioned
argument due to Bernstein, then there are a number of $L^1$-results on generalised Bernstein
polynomials [15], and pointwise asymptotic expansions found in [?] and [11]. Still, the summation
formulas (8), (18) seem to be new.

The Bernstein polynomial of degree $n$ for $\log(1-x)$ is

\[ B_n(x) = \sum_{m=1}^{n-1}\binom{n}{m}x^m(1-x)^{n-m}\log\left(1-\frac{m}{n}\right). \]

Lemma 1 If $\omega$ satisfies (3) then

\[ \lim_{n\to\infty}\int_0^1 |B_n(x)-\log(1-x)|\,\omega(dx) = 0. \]

Proof. There is no simple formula for the expectation of the logarithm of a binomial random
variable, but replacing the logarithms by the harmonic numbers, as $\log(1-m/n) = h_{n-m} - h_n + O((n-m)^{-1})$,
we have the explicit summation formula:

\[ \sum_{m=1}^{n-1}\binom{n}{m}x^m(1-x)^{n-m}(h_{n-m}-h_n) - x^n h_n = -\frac{x}{1} - \frac{x^2}{2} - \ldots - \frac{x^n}{n}. \tag{8} \]

By monotone convergence the series in the RHS approaches $\log(1-x)$ in the sense of $L^1(\omega,[0,1])$
whatever $\omega$. Getting back to $B_n$ easily yields the claim. □
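The summation formula (8) is a polynomial identity and can be checked in exact arithmetic. The sketch below, with helper names of our choosing, evaluates both sides at a few rational points.

```python
from fractions import Fraction
from math import comb

def harmonic(k):
    """Harmonic number h_k, with h_0 = 0."""
    return sum(Fraction(1, j) for j in range(1, k + 1))

def lhs(n, x):
    """Left-hand side of (8)."""
    s = sum(comb(n, m) * x**m * (1 - x)**(n - m) * (harmonic(n - m) - harmonic(n))
            for m in range(1, n))
    return s - x**n * harmonic(n)

def rhs(n, x):
    """Right-hand side of (8): minus the n-th Taylor polynomial of -log(1 - x)."""
    return -sum(x**j / j for j in range(1, n + 1))

points = [Fraction(1, 3), Fraction(2, 5), Fraction(9, 10)]
agree = all(lhs(n, x) == rhs(n, x) for n in range(1, 9) for x in points)
```

For $n = 2$ both sides equal $-x - x^2/2$, which can also be confirmed by hand.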

Proposition 1 strongly suggests the logarithmic asymptotics for $a_n$. Our proof of this fact
will rely on the following simple observation. Given $n_0 \ge 1$ suppose $(a_n)$ satisfies (7) for $n \ge n_0$;
then $(a_n + c)$ also satisfies the recursion for $n \ge n_0$, whatever the constant $c$.

Proposition 2 If $\omega$ meets (3) then any sequence $(a_n)$ satisfying (7) for $n \ge n_0 \ge 1$ has
asymptotics

\[ a_n \sim \frac{\log n}{\mu}. \]

In particular, this holds for the sequence $a_n = \mathbb{E}K_n$, which is the unique solution which satisfies
(7) for $n > 0$ and has the boundary value $a_0 = 0$.

Proof. Assume that there exists $\epsilon > 0$ such that $a_n > (1+\epsilon)\mu^{-1}\log n$ for infinitely many values
of $n$. We will lead this to a contradiction. Selecting $\epsilon$ smaller, for any fixed $c$ we could have the
inequality $a_n > (1+\epsilon)\mu^{-1}\log n + c$ for infinitely many values of $n$. Let $n(c)$ be the minimum
such $n$; then $n(c)\to\infty$ as $c\to\infty$. Thus for $n < n(c)$ we have $a_n \le (1+\epsilon)\mu^{-1}\log n + c$, which
implies

\[ 1 + \sum_{m=1}^{n(c)} q(n(c),m)\,a_{n(c)-m} \le 1 + c + \frac{1+\epsilon}{\mu}\sum_{m=1}^{n(c)-1} q(n(c),m)\log(n(c)-m). \]

Now from (7) and the definition of $n(c)$ we derive

\[ (1+\epsilon)\frac{\log n(c)}{\mu} + c < 1 + c + \frac{1+\epsilon}{\mu}\sum_{m=1}^{n(c)-1} q(n(c),m)\log(n(c)-m) \tag{9} \]

where $c$ itself cancels but $n(c)$ can be taken arbitrarily large by the choice of $c$.
From Lemma 1 we see that

\[ \sum_{m=1}^{n-1} q(n,m)\log(n-m) = \log n - \mu + o(1), \]

and substituting this formula into (9) and letting $c\to\infty$ yields $0 \le -\epsilon$, which is the promised
contradiction. Thus the assumption was wrong and because $\epsilon$ was arbitrary we have

\[ \limsup_{n\to\infty}\frac{a_n}{\mu^{-1}\log n} \le 1. \]

A symmetric argument proves the analogous lower bound, and the claim follows. □
Turning to the variance of the number of parts $v_n = \mathrm{Var}\,K_n$ we derive from (6) a recursion

\[ v_n = \Big(2a_n - 1 - a_n^2 + \sum_{m=1}^{n} q(n,m)\,a_{n-m}^2\Big) + \sum_{m=1}^{n} q(n,m)\,v_{n-m}, \qquad v_0 = 0, \tag{10} \]

which involves $a_n = \mathbb{E}K_n$. Both (7) and (10) are instances of the general equation

\[ b_n = r_n + \sum_{m=1}^{n} q(n,m)\,b_{n-m}, \qquad b_0 = 0, \tag{11} \]

where $(b_n)$ are unknowns, and $(r_n)$ is given. The proof of Proposition 2 is easily extended to
obtain

Corollary 1 Assume (3). For any $n_0$ and $r \ne 0$, if $(b_n)$ satisfies (11) for $n > n_0$ and if
$r_n \to r$ then $b_n \sim r\mu^{-1}\log n$ as $n\to\infty$.
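Both (7) and (10) can be handled by one small solver for the general equation (11). The sketch below again assumes the uniform $\omega$, so that $q(n,m) = 1/n$; in that case $K_n$ has the law of the number of cycles of a uniform random permutation, whose variance is $\sum_{i \le n}(1/i - 1/i^2)$, and this gives an independent check. The names `solve` and `variance_term` are ours.

```python
from fractions import Fraction

def solve(n_max, q, r):
    """Solve b_n = r(n) + sum_{m=1}^{n} q(n, m) b_{n-m} with b_0 = 0, as in (11)."""
    b = [Fraction(0)]
    for n in range(1, n_max + 1):
        b.append(r(n) + sum(q(n, m) * b[n - m] for m in range(1, n + 1)))
    return b

def q_uniform(n, m):
    """q(n, m) = 1/n when omega is uniform on [0, 1] (illustrative assumption)."""
    return Fraction(1, n)

N = 12
a = solve(N, q_uniform, lambda n: Fraction(1))  # recursion (7)

def variance_term(n):
    """Inhomogeneous (bracketed) term of recursion (10)."""
    return (2 * a[n] - 1 - a[n] ** 2
            + sum(q_uniform(n, m) * a[n - m] ** 2 for m in range(1, n + 1)))

v = solve(N, q_uniform, variance_term)  # recursion (10)
expected_v = [sum(Fraction(1, i) - Fraction(1, i * i) for i in range(1, n + 1))
              for n in range(N + 1)]
```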

With a logarithmic asymptotics for $v_n$ in mind, we aim to show the convergence of the
bracketed inhomogeneous term in (10). It is easily seen that for this purpose we need more
than just the principal-term asymptotics of the expectation, and it is exactly the point where
the renewal theory provides indispensable tools.

5. It is well known that a renewal process starting at 0 admits a delayed version which has
the expected number of renewals within $[0,z]$ (the potential measure) growing linearly with $z$,
see [?]. It turns out that the stationary renewal process induces a 'stationary' version of the
Markov chain $Q_n$, which can be used for the asymptotic analysis of (11).

Let $g(n,m)$ be the probability that $Q_n$ ever visits state $m$ (which means that at some round
of the game there are exactly $m$ players left). Since $Q_n$ can visit each nonabsorbing state
at most once, $g(n,m)$ is also the potential function, i.e. the expected number of visits to $m$.
Interpreting $r_m$ as a 'reward' collected at a visit to state $m$, we can think of $b_n$ satisfying (11) as
the total expected reward of $Q_n$. The interpretation implies

\[ b_n = \sum_{m=1}^{n} g(n,m)\,r_m \tag{12} \]

(with $g(n,n) = 1$, since the chain starts at $n$) and reduces solving (11) to computation of the
potential function. An explicit formula is complicated as it involves summation of products over
a constrained set of compositions of $n$.
Fortunately, there is a simple asymptotic formula.
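The reward interpretation can be checked on a small example. The sketch below assumes, purely for illustration, that $\omega$ has density $2x$ on $[0,1]$, for which $w(n,m) = 2(m+1)/((n+1)(n+2))$; the potential function is computed by conditioning on the first step, and the total reward $\sum_m g(n,m) r_m$ (state $n$ included, with $g(n,n) = 1$) is compared with the solution of the recursion $b_n = r_n + \sum_m q(n,m) b_{n-m}$. All names are ours.

```python
from fractions import Fraction

def q(n, m):
    """Decrement probability when omega has density 2x (illustrative assumption)."""
    return Fraction(2 * (m + 1), (n + 1) * (n + 2) - 2)

def potential(n_max):
    """g[n][m] = probability that the chain started at n ever visits m."""
    g = [[Fraction(0)] * (n_max + 1) for _ in range(n_max + 1)]
    for n in range(1, n_max + 1):
        g[n][n] = Fraction(1)
        for m in range(1, n):
            # condition on the first decrement j, which must not jump below m
            g[n][m] = sum(q(n, j) * g[n - j][m] for j in range(1, n - m + 1))
    return g

def solve(n_max, r):
    """Solve b_n = r(n) + sum_m q(n, m) b_{n-m} with b_0 = 0."""
    b = [Fraction(0)]
    for n in range(1, n_max + 1):
        b.append(r(n) + sum(q(n, m) * b[n - m] for m in range(1, n + 1)))
    return b

reward = lambda m: Fraction(1, m * m)  # arbitrary test rewards
N = 10
g = potential(N)
b = solve(N, reward)
by_potential = [sum(g[n][m] * reward(m) for m in range(1, n + 1))
                for n in range(1, N + 1)]
```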

Suppose $\Omega$ is not supported by a lattice, and has finite first moment $\mu$. For $\omega$ this means
(3) and that the support is not a geometric sequence like $1-x^j$ (in particular, the case of
geometric frequencies, when $\omega$ is supported by a single point, is excluded). Switching to the
renewal representation, we introduce a probability distribution

\[ \Omega_0[0,z] = \frac{1}{\mu}\int_0^z \Omega[\zeta,\infty]\,d\zeta. \]

Let the overshoot $B(z)$ be the distance from $z$ to the leftmost point of $\mathcal{R}$ to the right of $z$
($B(z)$ is sometimes called the forward process, or forward recurrence time, or residual lifetime
etc.). The renewal theory, as presented in vol. 2 of Feller's textbook, says that $\Omega_0$ is the
limiting distribution of the overshoot as $z\to\infty$. Observe that $Q_n$ visits $m$ when there is a
point of $\mathcal{R}$ between $E_{n-m}$ and $E_{n-m+1}$ or, equivalently, when the overshoot at $E_{n-m}$ does not
exceed $E_{n-m+1}-E_{n-m}$. The spacing between the two order statistics is independent of $E_{n-m}$
and its distribution is Exponential($m$). By the renewal theorem the distribution of $B(E_{n-m})$
converges to $\Omega_0$ as $n\to\infty$ because $E_{n-m}\to\infty$ (in probability), thus

\[ g(n,m) = \mathbb{P}\big(B(E_{n-m}) < E_{n-m+1}-E_{n-m}\big) \to \int_0^\infty e^{-mz}\,\Omega_0(dz) = \frac{1}{\mu m}\int_0^\infty (1-e^{-mz})\,\Omega(dz), \]

where the last step follows via integrating by parts. Changing measure back to $\omega$ we obtain

Proposition 3 If $\omega$ is not supported by a geometric sequence and satisfies (3) then for
any $m$

\[ \lim_{n\to\infty} g(n,m) = \frac{W(m)}{\mu m}. \]

The proposition suggests modifying the chain $Q_n$ so that the potential function becomes exactly

\[ g_0(m) := \frac{W(m)}{\mu m}, \qquad m = 1,\ldots,n-1. \]

We shall do this by assuming a special distribution for the first transition (which can be thought
of as a qualifying round before the game).

Remark. Another possibility would be to introduce a proper initial distribution on $\{0,1,\ldots,n\}$
so that the formula for the potential function is valid also for $m = n$. But this would correspond
to a composition of a random integer, a model we wish to avoid.
Renewal theory offers a construction of a stationary version of $\mathcal{R}$. Take $Z_0$ independent of $\mathcal{R}$
and with distribution $\Omega_0$. The shifted set $\mathcal{R}_0 = Z_0 + \mathcal{R}$ is the range of the stationary (delayed)
renewal process. For any $z > 0$ the overshoot distribution for $\mathcal{R}_0$ at $z$ coincides with $\Omega_0$.

The points of $\mathcal{R}_0$ induce an interval partition of $[0,\infty]$, thus also a partition of the sequence
of order statistics $E_1,\ldots,E_n$. Recording the sizes of blocks we obtain a stationary composition
$C_{0n}$ of $n$. The parts of $C_{0n}$ are considered as decrements of a new Markov chain $Q_{0n}$. Repeating
the argument which led us to Proposition 3 we derive from invariance of the distribution of
$B(z)$ that $g_0$ is the potential function of $Q_{0n}$.

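For any reward function the solution of (11) satisfies

\[ \sum_{m=1}^{n-1} g_0(m)\,r_m = \sum_{m=1}^{n-1} q_0(n,m)\,b_{n-m} \tag{13} \]

where $q_0(n,\cdot)$ is the distribution of the first part of $C_{0n}$. This formula follows by computing
the total expected reward of $Q_{0n}$ upon departure from state $n$. Including state $n$, and expressing
the first transition through the number of tags to the left of $Z_0$, leads to

\[ r_n + \sum_{m=1}^{n-1} g_0(m)\,r_m = (1-w_0(n,0))\,r_n + \sum_{m=0}^{n} w_0(n,m)\,b_{n-m}, \tag{14} \]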

where $w_0(n,\cdot)$ is the distribution of the number of $E_j$'s to the left of $Z_0$. Explicitly,

\[ w_0(n,m) = \binom{n}{m}\int_0^\infty (1-e^{-z})^m e^{-(n-m)z}\,\Omega_0(dz) \]

and

\[ q_0(n,m) = w_0(n,m) + w_0(n,0)\,q(n,m). \]



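And when expressed via binomial moments of $\omega$ this becomes, for $1 \le m \le n-1$,

\[ q_0(n,m) = \frac{1}{\mu}\left(\frac{1}{n-m}\sum_{k=m+1}^{n} w(n,k) + \frac{w(n,m)}{n}\right). \]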

Remarks. The relation between the compositions $C_{0n}$ and $C_n$ is that they are identically
distributed given the size of the first part. The distribution of $C_{0n}$ is of the form (2) with the
first factor replaced by $q_0(n,n_1)$.
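The stationary quantities can be tested in exact arithmetic on an example. The sketch below assumes, for illustration only, that $\omega$ has density $2x$, so that $w(n,m) = 2(m+1)/((n+1)(n+2))$ and $\mu = 3/2$. It also assumes our own rewriting of $w_0$ through binomial moments, $w_0(n,m) = \frac{1}{\mu(n-m)}\sum_{k>m} w(n,k)$ for $m < n$, obtained by integrating by parts. With these assumptions it compares the reward sum $\sum_{m<n} g_0(m) r_m$ with $\sum_m q_0(n,m) b_{n-m}$, and then checks the identity obtained by adding $r_n$ to both sides and expanding $q_0$, namely $r_n + \sum_{m<n} g_0(m) r_m = (1-w_0(n,0)) r_n + \sum_{m=0}^{n} w_0(n,m) b_{n-m}$.

```python
from fractions import Fraction

MU = Fraction(3, 2)  # integral of |log(1 - x)| * 2x over [0, 1]

def w(n, m):
    """Binomial moments of omega with density 2x (illustrative assumption)."""
    return Fraction(2 * (m + 1), (n + 1) * (n + 2))

def W(n):
    return 1 - w(n, 0)

def q(n, m):
    return w(n, m) / W(n)

def w0(n, m):
    """Law of the number of tags to the left of Z_0 (our rewriting, see lead-in)."""
    if m < n:
        return sum(w(n, k) for k in range(m + 1, n + 1)) / (MU * (n - m))
    return 1 - sum(w0(n, k) for k in range(n))

def q0(n, m):
    return w0(n, m) + w0(n, 0) * q(n, m)

def g0(m):
    return W(m) / (MU * m)

def solve(n_max, r):
    """Solve b_n = r(n) + sum_m q(n, m) b_{n-m} with b_0 = 0."""
    b = [Fraction(0)]
    for n in range(1, n_max + 1):
        b.append(r(n) + sum(q(n, m) * b[n - m] for m in range(1, n + 1)))
    return b

reward = lambda m: Fraction(1, m)  # arbitrary test rewards
N = 8
b = solve(N, reward)
lhs13 = [sum(g0(m) * reward(m) for m in range(1, n)) for n in range(1, N + 1)]
rhs13 = [sum(q0(n, m) * b[n - m] for m in range(1, n + 1)) for n in range(1, N + 1)]
rhs14 = [(1 - w0(n, 0)) * reward(n) + sum(w0(n, m) * b[n - m] for m in range(n + 1))
         for n in range(1, N + 1)]
lhs14 = [reward(n) + lhs13[n - 1] for n in range(1, N + 1)]
```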

The distributional identity $(C_n) \stackrel{d}{=} (C_{0n})$ holds iff $\Omega = \Omega_0$, in which case $\Omega$ is an
exponential distribution, $\mathcal{R}$ is a homogeneous Poisson point process and therefore $C_n$ is governed
by the ordered ESF. This explains, to an extent, the role of the ESF as a 'central limit' because
a superposition of many rare renewal processes approaches the Poisson process.

For a suitable choice of $\Omega$ the sums in the RHS of (13) or (14) become Cesàro or Euler
averages. The LHS is easy to analyse, but concluding directly from these relations about the
behaviour of $(a_n)$ is only possible when $(a_n)$ is known to satisfy certain regularity conditions
(the Tauberian conditions). The direct approach seems hard to realise because the regularity
conditions are very sensitive to the summability method.
For $r_n = 1$ the LHS of (14) is the expected number of parts of the stationary composition,
which is equal to

\[ 1 + \frac{1}{\mu}\sum_{m=1}^{n-1}\frac{W(m)}{m} \]

and, quite expectedly, is asymptotic to $\mu^{-1}\log n$.

Example. For ESF(1) we have $W(n) = 1-(n+1)^{-1}$ and $\mu = 1$, whence the expected
number of parts is the harmonic number $h_n = 1 + 2^{-1} + 3^{-1} + \ldots + n^{-1}$, as is well known [1].

We have seen that $g(n,m)\to g_0(m)$ for $n\to\infty$ and wish to obtain the asymptotics of
(12) by substituting $g_0$ instead of $g$. To this end we need a stronger assumption on $\omega$,

\[ \nu := \int_0^1 (\log(1-x))^2\,\omega(dx) < \infty, \]

which in terms of $\Omega$ means finiteness of the second moment, $\nu = \int_0^\infty z^2\,\Omega(dz)$.

Proposition 4 Suppose $\omega$ is not supported by a geometric sequence and also $\nu < \infty$.
Suppose $(r_n)$ is such that $|r_n| < r'_n$ where $(r'_n)$ is a decreasing sequence satisfying $\sum r'_n/n < \infty$.
Then for $(b_n)$ solving (11) we have

\[ \lim_{n\to\infty} b_n = \frac{1}{\mu}\sum_{n=1}^{\infty}\frac{r_n W(n)}{n}. \]

Proof. Given an integer $J$ suppose $(r_n)$ is such that $r_n = 0$ for $n < J$, is decreasing for $n \ge J$ and
satisfies $\sum r_n/n < \infty$. We wish to show that for the sequence $(b_n)$ solving (11) with such $(r_n)$
there is a bound

\[ \limsup_{n\to\infty} b_n \le \frac{1}{\mu}\sum_{n=1}^{\infty}\frac{r_n}{n} + \frac{r_J\,\nu}{\mu^2}. \tag{15} \]

To this end we will make use of the renewal representation.

Recall that $Q_n$ collects reward $r_m$ if the chain visits state $m$. This occurs when $\mathcal{R}$ has
at least one point between $E_{n-m}$ and $E_{n-m+1}$, in which case let us assign reward $r_m$ to the
rightmost such point (equivalently, given that $E_{n-m+1}$ is the leftmost point in a cluster, the point of
$\mathcal{R}$ in question is the left endpoint of the component interval of $[0,\infty]\setminus\mathcal{R}$ containing $E_{n-m+1}$). Let
$U$ be the potential measure of $\mathcal{R}$, so that $U[0,z]$ is the expected cardinality of $\mathcal{R}\cap[0,z]$. The
total expected reward of $Q_n$ may be written as

\[ r_n W(n) + \int_0^\infty \Phi_n(z)\,U(dz) \]

where the first term stands for the reward at $0\in\mathcal{R}$, which is only due in the event that the
first jump of the renewal process exceeds $E_1$, and the integrand is

\[ \Phi_n(z) = \sum_{m=1}^{n}\binom{n}{m}e^{-zm}(1-e^{-z})^{n-m}\,r_m W(m). \tag{16} \]
In the same manner, we associate the rewards collected by the stationary chain $Q_{0n}$ with
the separating points of $\mathcal{R}_0$ and with 0 (an exceptional point, not in $\mathcal{R}_0$). The expected reward
becomes

\[ r_n\,(1-w_0(n,0)) + \int_0^\infty \Phi_n(z)\,U_0(dz) \]

where $U_0$ is the potential measure of $\mathcal{R}_0$ and the first term stands for the event that $E_1 < Z_0$
(i.e., $E_1$ falls to the left of $\mathcal{R}_0$). This is, of course, yet another expression for the LHS of (14).

We now modify the reward processes for the chains $Q_n$ and $Q_{0n}$ by deleting the first term
(reward at 0) and by replacing $\Phi_n$ with another function $\tilde\Phi_n$ defined via (16) but with the factors
$W(m)$ deleted. Deleting the first term has no asymptotic effect because $r_n$ goes to 0 as $n\to\infty$.
We also have $\Phi_n(z) \le \tilde\Phi_n(z)$, thus $\tilde\Phi_n(z)$ corresponds to a more generous reward structure,
with reward at $z$ being $r_m$ if $E_{n-m} < z < E_{n-m+1}$ (thus there is no other constraint on $z$ except
that $z\in\mathcal{R}$, respectively $z\in\mathcal{R}_0$). The modified reward associated with $\mathcal{R}_0$ is the sum appearing
in (15).

The function $\tilde\Phi_n(z)$ is unimodal, with the unique maximum attained at $z^*$, which is the
unique positive solution of the equation

\[ -r_J\binom{n-1}{J-1} + \sum_{m=J}^{n-1}(r_m - r_{m+1})\binom{n-1}{m}\left(\frac{1}{e^{z}-1}\right)^{m-J+1} = 0 \]

(the uniqueness follows from the monotonicity of $(r_j)$, $j \ge J$). For $n\to\infty$ Poisson approximation
provides the asymptotics $ne^{-z^*}\to\zeta$ where $\zeta$ is the unique positive root of the transcendental
equation

\[ -\frac{r_J}{(J-1)!} + \sum_{m=J}^{\infty}(r_m - r_{m+1})\frac{\zeta^{m-J+1}}{m!} = 0. \]

In the following argument it is only important that $z^*\to\infty$ as $n\to\infty$.

Because $\mathcal{R}_0$ is $\mathcal{R}$ shifted to the right, $\mathcal{R}_0 = Z_0 + \mathcal{R}$, there is a one-to-one correspondence
between the sets $\mathcal{R}\,\cap\,]0,(z^*-Z_0)^+]$ and $\mathcal{R}_0\,\cap\,]0,z^*]$. Furthermore, because $\tilde\Phi_n$ is increasing on
$[0,z^*]$ the total (modified) reward of $\mathcal{R}_0$ over $]0,z^*]$ is larger than that of $\mathcal{R}$ on $]0,(z^*-Z_0)^+]$.
On the other hand, the expected reward of $\mathcal{R}$ on $](z^*-Z_0)^+,z^*]$ has an asymptotic bound
$r_J\nu/(2\mu^2)$; indeed $r_J = \max r_j$ is an upper bound for the instantaneous reward and the potential
$U](z^*-Z_0)^+,z^*]$ is asymptotic to

\[ \frac{\mathbb{E}Z_0}{\mu} = \frac{1}{\mu^2}\int_0^\infty z\,\Omega[z,\infty]\,dz = \frac{1}{2\mu^2}\int_0^\infty z^2\,\Omega(dz) \]

as follows from the two-term expansion in the renewal theorem, in the case $\nu < \infty$ (see [?],
Section 4, Chapter XI).

To the right of $z^*$ the relation is reversed, since the function $\tilde\Phi_n$ is decreasing. Shifting the
origin to the leftmost point of $\mathcal{R}_0\cap[z^*,\infty]$ enables one to view $\mathcal{R}$ on the new scale as the range of a
delayed renewal sequence. Thus the expected reward of $\mathcal{R}_0$ on $[z^*,\infty]$ is larger than that of $\mathcal{R}$,
up to a term estimated by $r_J\nu/(2\mu^2)$, exactly as above. Putting the two parts together shows
that the expected modified reward is bounded by the RHS of (15). The unmodified reward is
smaller, hence (15).
Now suppose $(r_n)$ is decreasing and satisfies $\sum r_n/n < \infty$. We split the sequence at $J$ and
decompose it in two: $r_n = r_n 1_{\{n<J\}} + r_n 1_{\{n\ge J\}}$. Since recursion (11) is linear, the decomposition
forces a representation of the solution as, say, $b_n = b'_n + b''_n$. Applying the renewal theorem we
get $b'_n \to \mu^{-1}\sum_{n<J} r_n W(n)/n$. As for the second part, $\limsup b''_n$ is estimated with the help of
(15) and approaches zero when $J\to\infty$, because both $r_J$ and the tail sum of the series vanish.

For an arbitrary sequence satisfying the condition of the proposition, splitting at $J$ yields one part
converging to $\mu^{-1}\sum_{n<J} r_n W(n)/n$ and another part estimated by a solution with reward
sequence decreasing for $n \ge J$, thus going to 0 as $J$ grows. □

Now we are in a position to improve on the asymptotics of $a_n = \mathbb{E}K_n$. Recall that the
asymptotic expansion of the harmonic number starts with $h_n = \log n + \gamma + O(n^{-1})$.

Proposition 5 Suppose $\omega$ is not supported by a geometric sequence, satisfies (4) and
$\nu < \infty$. Then

\[ a_n = \frac{\log n}{\mu} + \frac{\gamma}{\mu} + b + o(1) \]

where $\gamma$ is the Euler constant and

\[ b = \frac{1}{\mu}\int_0^1 \log x\,\omega(dx) + \frac{\nu}{2\mu^2}. \]

Proof. Writing $a_n = \mu^{-1}h_n + b_n$, substituting this into (7) and using the summation formula
(8) we find that $(b_n)$ satisfies (11) with

\[ r_n = 1 - \frac{1}{\mu W(n)}\int_0^1\left(\frac{x}{1}+\frac{x^2}{2}+\ldots+\frac{x^n}{n}\right)\omega(dx), \]

which can also be written as

\[ r_n W(n) = -\int_0^1 (1-x)^n\,\omega(dx) + \frac{1}{\mu}\int_0^1\left(\frac{x^{n+1}}{n+1}+\frac{x^{n+2}}{n+2}+\ldots\right)\omega(dx). \]

Using monotone convergence and manipulating the series we find

\[ \frac{1}{\mu}\sum_{n=1}^{\infty}\frac{r_n W(n)}{n} = \frac{1}{\mu}\int_0^1 \log x\,\omega(dx) + \frac{1}{2\mu^2}\int_0^1 (\log(1-x))^2\,\omega(dx) = b. \]

Since $W(n)\to 1$ and $r_n W(n)$ is the difference of two terms which decrease in $n$, application of
Proposition 4 yields $b_n \to b$. □
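The two-term expansion can be compared with the exact recursion on an example. The sketch below assumes, for illustration only, that $\omega$ has density $2x$ on $[0,1]$; then $\mu = 3/2$, $\nu = 7/2$ and $\int_0^1 \log x\,\omega(dx) = -1/2$, so that the constant read as $b = \mu^{-1}\int_0^1 \log x\,\omega(dx) + \nu/(2\mu^2)$ equals $4/9$. The names and the choice $n = 300$ are ours.

```python
import math

MU, NU = 1.5, 3.5          # first and second moments of Omega for density 2x
INT_LOG_X = -0.5           # integral of log(x) * 2x over [0, 1]
EULER_GAMMA = 0.5772156649015329

def q(n, m):
    """Decrement probability w(n, m) / W(n) for omega with density 2x."""
    return 2.0 * (m + 1) / ((n + 1) * (n + 2) - 2.0)

def expected_parts(n_max):
    """Iterate a_n = 1 + sum_m q(n, m) a_{n-m} with a_0 = 0."""
    a = [0.0]
    for n in range(1, n_max + 1):
        a.append(1.0 + sum(q(n, m) * a[n - m] for m in range(1, n + 1)))
    return a

N = 300
a = expected_parts(N)
b_limit = INT_LOG_X / MU + NU / (2 * MU ** 2)            # equals 4/9
approx = (math.log(N) + EULER_GAMMA) / MU + b_limit      # two-term expansion
```

For small $n$ the agreement is already visible: $a_2 = 1.4$ and $a_3 \approx 1.644$, while the expansion gives about 1.29 and 1.56.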

6. With no additional assumptions we will derive the asymptotics of the variance $v_n = \mathrm{Var}\,K_n$.
The key issue is the asymptotic evaluation of the inhomogeneous term of the recursion.

Lemma 2 Under the assumptions of Proposition 5 the expectation $a_n = \mathbb{E}K_n$ satisfies

\[ \lim_{n\to\infty}\Big(2a_n - 1 - a_n^2 + \sum_{m=1}^{n} q(n,m)\,a_{n-m}^2\Big) = \frac{\nu}{\mu^2} - 1. \]
Proof. For $b_n$, $r_n$ bearing the same meaning as in Proposition 5, we have

\[ b_n \to b, \qquad W(n)\to 1, \qquad b_n - \sum_{m=1}^{n} q(n,m)\,b_{n-m} = r_n, \]

and integrating by parts yields

\[ r_n W(n) = -n\int_0^1 \omega[0,x]\,(1-x)^{n-1}\,dx + \frac{1}{\mu}\int_0^1 \frac{\omega[x,1]\,x^n}{1-x}\,dx. \]

Further useful estimates follow from the $n\to\infty$ asymptotics

\[ \int_0^1 x^n\,\omega(dx) = o\left(\frac{1}{\log n}\right), \qquad \int_0^1 (1-x)^n\,\omega(dx) = o\left(\frac{1}{\log n}\right), \qquad r_n = o\left(\frac{1}{\log n}\right). \tag{17} \]

To justify the first relation, observe that integrability and monotonicity of $\log(1-x)$ imply that
$\omega[x,1] = o(|\log(1-x)|^{-1})$ for $x\uparrow 1$ (in fact, the relation is equivalent to the integrability).
Integrating by parts and using monotonicity we have

\[ \int_0^1 x^n\,\omega(dx) = \int_0^1 n x^{n-1}\,\omega[x,1]\,dx < \mathrm{const}\cdot\int_0^1 n x^{n-1}\,|\log(1-x)|^{-1}\,dx \]

and by a Tauberian argument this is $o(|\log n|^{-1})$. The second relation follows in the same way
from

\[ \int_0^1 (1-x)^n\,\omega(dx) = n\int_0^1 (1-x)^{n-1}\,\omega[0,x]\,dx < \mathrm{const}\cdot\int_0^1 n(1-x)^{n-1}\,|\log x|^{-1}\,dx \]

and $\omega[0,x] = o(|\log x|^{-1})$ for $x\downarrow 0$. And the third relation follows from the first two.
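Substituting $a_n = \mu^{-1}h_n + b_n$ and grouping terms we have

\[ 2a_n - 1 - a_n^2 + \sum_{m=1}^{n} q(n,m)\,a_{n-m}^2 = T_1 + T_2 + T_3 - 1 \]

with three to-be-evaluated terms

\[ T_1 = -b_n^2 + \sum_{m=1}^{n} q(n,m)\,b_{n-m}^2, \]

\[ T_2 = 2b_n - \frac{2b_n h_n}{\mu} + \frac{2}{\mu}\sum_{m=1}^{n} q(n,m)\,b_{n-m}h_{n-m}, \]

\[ T_3 = \frac{2h_n}{\mu} - \frac{h_n^2}{\mu^2} + \frac{1}{\mu^2}\sum_{m=1}^{n} q(n,m)\,h_{n-m}^2. \]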

From $b_n \to b$ it is obvious that $T_1 \to 0$ as $n\to\infty$. To see that also $T_2$ vanishes write

\[ b_{n-m}h_{n-m} = b_{n-m}h_n + (b_{n-m}-b)(h_{n-m}-h_n) + b\,(h_{n-m}-h_n); \]

then from (8) and (17) we obtain

\[ \sum_{m=1}^{n-1} q(n,m)(h_{n-m}-h_n) = \frac{1}{W(n)}\int_0^1\Big(h_n x^n - \sum_{j=1}^{n}\frac{x^j}{j}\Big)\omega(dx), \]

hence by (17) and Lemma 1

\[ b\sum_{m=1}^{n} q(n,m)(h_{n-m}-h_n) \to -\mu b, \]

\[ \sum_{m=1}^{n} q(n,m)(b_{n-m}-b)(h_{n-m}-h_n) \to 0, \]

\[ \sum_{m=1}^{n} q(n,m)\,b_{n-m}h_n = (b_n - r_n)h_n = b_n h_n + o(1), \]

which indeed implies $T_2 \to 0$.

To evaluate $T_3$ we need a summation formula similar to (8), but this time we should take a
combinatorial analogue of $\log^2$ in place of $\log$. To this end, introduce

\[ s_n = \sum_{1\le i\le j\le n}\frac{1}{ij}; \]

then there is a summation formula

\[ \sum_{m=0}^{n-1}\binom{n}{m}x^m(1-x)^{n-m}s_{n-m} = s_n - \sum_{j=1}^{n}\frac{x^j}{j}(h_n - h_{j-1}), \tag{18} \]

where we recognise a partial sum of the Taylor series

\[ \frac{1}{2}(\log(1-x))^2 = \sum_{j=1}^{\infty}\frac{x^j}{j}h_{j-1}. \]

It follows that

\[ \sum_{m=1}^{n-1} q(n,m)\,s_{n-m} = s_n - \frac{1}{W(n)}\int_0^1\Big(\sum_{j=1}^{n}\frac{x^j}{j}(h_n - h_{j-1})\Big)\omega(dx) \]

and because $h_n^2$ differs from $2s_n$ by the partial sum of a converging series,

\[ h_n^2 = 2s_n - \sum_{j=1}^{n}\frac{1}{j^2}, \]

we conclude that

\[ \sum_{m=1}^{n} q(n,m)\,h_{n-m}^2 = h_n^2 - \frac{2}{W(n)}\int_0^1\Big(\sum_{j=1}^{n}\frac{x^j}{j}(h_n - h_{j-1})\Big)\omega(dx) + o(1), \]

where we exploited monotone convergence and (17). Now it is easily seen that $T_3 \to \nu/\mu^2$.
Putting the terms together we arrive at $T_1 + T_2 + T_3 - 1 \to \nu/\mu^2 - 1$. □
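The convergence of the inhomogeneous term can be observed numerically. The sketch below assumes, for illustration only, that $\omega$ has density $2x$, for which $\mu = 3/2$ and $\nu = 7/2$, so that the limiting value read as $\nu/\mu^2 - 1$ equals $5/9$. The names and the choice $n = 400$ are ours.

```python
def q(n, m):
    """Decrement probability w(n, m) / W(n) for omega with density 2x."""
    return 2.0 * (m + 1) / ((n + 1) * (n + 2) - 2.0)

def expected_parts(n_max):
    """Iterate a_n = 1 + sum_m q(n, m) a_{n-m} with a_0 = 0."""
    a = [0.0]
    for n in range(1, n_max + 1):
        a.append(1.0 + sum(q(n, m) * a[n - m] for m in range(1, n + 1)))
    return a

def bracket(n, a):
    """Inhomogeneous term 2 a_n - 1 - a_n^2 + sum_m q(n, m) a_{n-m}^2."""
    return (2 * a[n] - 1 - a[n] ** 2
            + sum(q(n, m) * a[n - m] ** 2 for m in range(1, n + 1)))

N = 400
a = expected_parts(N)
limit = 3.5 / 1.5 ** 2 - 1   # nu / mu^2 - 1 = 5/9 for density 2x
value = bracket(N, a)
```

The first few values of the bracket are 0, 0.24 and about 0.354, increasing towards the limit.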

Remark. The summation formula (18) implies an analogue of Lemma 1: for an arbitrary
normalised weight $\omega$ the square of the logarithm is $L^2([0,1],\omega)$-approximable by its Bernstein
polynomial.

Appealing to Corollary 1 we obtain the desired asymptotics of the variance. Define $\sigma^2 = \nu - \mu^2$,
that is, $\sigma^2 = \int_0^\infty (z-\mu)^2\,\Omega(dz)$ is the variance of the distribution $\Omega$.

Proposition 6 Under the assumptions of Proposition 5

\[ \mathrm{Var}\,K_n \sim \frac{\sigma^2}{\mu^3}\log n. \]

7. We turn next to the central limit theorem for $K_n$. Neininger and Rüschendorf [?]
derived a general CLT for solutions of equations like (6). In our context, the assumptions of
their Theorem 2.1 are easily checked, with the only exception that their CLT requires some
expansion $\mathrm{Var}\,K_n = \sigma^2\mu^{-3}\log n + O((\log n)^{1-\epsilon})$, which is not guaranteed by the integrability of
$(\log(1-x))^2$ but rather relies on integrability of a higher power of the logarithm. We shall see
that in our situation no additional assumptions are necessary and the CLT follows by a simple
comparison with the number of renewals.

Given $n$, define a cell to be a component interval of $[0,\infty]\setminus\mathcal{R}$ containing at least one $E_j$,
$j \le n$. Clearly, the total number of cells is $K_n$. Let $L_n$ be the number of cells which have the
left endpoint smaller than $\log n$, and let $R_n$ be the number of renewals on $[0,\log n]$ (including 0), that
is $R_n = \#(\mathcal{R}\cap[0,\log n])$. It is an easy matter to see that $L_n \le R_n$ and $L_n \le K_n$. Moreover,
since the expected number of order statistics that exceed $\log n$ is 1, we have $\mathbb{E}(K_n - L_n) \le 1$.

Proposition 7 Under the assumptions of Proposition 5,

\[ \frac{K_n - \mu^{-1}\log n}{\sigma\mu^{-3/2}\sqrt{\log n}} \tag{19} \]

converges weakly to the standard normal random variable.

Proof. By [?] (Section 5, Ch. XI), $R_n$ is asymptotically normal with expectation $\mu^{-1}\log n$ and
variance $\sigma^2\mu^{-3}\log n$. Furthermore, $\mathbb{E}R_n = \mu^{-1}\log n + \nu(2\mu^2)^{-1} + o(1)$ ([?], Equation (4.5)).
By the asymptotics of moments (Propositions 5 and 6) and the above inequalities, the $L^1$-distance
between any two of the three random variables $(K_n - a_n)v_n^{-1/2}$, $(L_n - a_n)v_n^{-1/2}$ and $(R_n - a_n)v_n^{-1/2}$
goes to zero. It follows that $L_n$ and $K_n$ are also asymptotically normal. □
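For moderate $n$ the exact law of $K_n$ can be tabulated from the distributional equation $K_n \stackrel{d}{=} 1 + K'_{n-F_n}$ and compared with the normal approximation. The following is a minimal sketch under the illustrative assumption that $\omega$ is uniform, so that $q(n,m) = 1/n$; the function names are ours.

```python
from fractions import Fraction

def parts_distribution(n_max, q):
    """dist[n][k] = P(K_n = k), built from K_n = 1 + K'_{n - F_n}."""
    dist = [[Fraction(1)]]  # K_0 = 0 with probability one
    for n in range(1, n_max + 1):
        row = [Fraction(0)] * (n + 1)
        for m in range(1, n + 1):               # size of the first part
            for k, p in enumerate(dist[n - m]):  # parts of the remainder
                row[k + 1] += q(n, m) * p
        dist.append(row)
    return dist

def q_uniform(n, m):
    """q(n, m) = 1/n when omega is uniform on [0, 1] (illustrative assumption)."""
    return Fraction(1, n)

dist = parts_distribution(8, q_uniform)
mean_8 = sum(k * p for k, p in enumerate(dist[8]))
```

Standardising `dist[n]` with the mean and variance from the recursions gives a histogram that can be set against the standard normal density; since the normalisation is of order $\sqrt{\log n}$, the approach to normality is slow.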

In fact, the renewal theorem taken together with a Poisson limit for the number of $E_j$'s
exceeding $\log n$ implies weak convergence of $K_n - L_n$. The asymptotics of the expectation involves
the exponential integral function

\[ I(x) = \int_x^\infty e^{-y}y^{-1}\,dy. \]
Proposition 8 We have

\[ \lim_{n\to\infty}\mathbb{E}(K_n - L_n) = \frac{\gamma}{\mu} + \frac{1}{\mu}\int_0^1 \log x\,\omega(dx) + \frac{1}{\mu}\int_0^1 I(x)\,\omega(dx). \]

Proof. Recalling (16), using Poisson approximation and the renewal theorem, and changing the
variable of integration to $\xi = ne^{-z}$ we compute (with $r_m = 1$ for all $m$)

\[ \mathbb{E}(K_n - L_n) = \frac{1}{\mu}\int_{\log n}^{\infty}\Phi_n(z)\,dz + o(1) = \frac{1}{\mu}\int_0^1 e^{-\xi}\sum_{m=1}^{\infty}\frac{\xi^m W(m)}{m!}\,\frac{d\xi}{\xi} + o(1) \]

\[ = \frac{1}{\mu}\int_0^1\int_0^1 \frac{1-e^{-\xi x}}{\xi}\,d\xi\,\omega(dx) + o(1) = \frac{\gamma}{\mu} + \frac{1}{\mu}\int_0^1 \log x\,\omega(dx) + \frac{1}{\mu}\int_0^1 I(x)\,\omega(dx) + o(1) \]

where we also used

\[ e^{-\xi}\sum_{m=1}^{\infty}\frac{\xi^m(1-(1-x)^m)}{m!} = 1 - e^{-\xi x} \]

and the well-known formula

\[ \int_0^x \frac{1-e^{-y}}{y}\,dy = I(x) + \log x + \gamma. \qquad\Box \]



Now recalling

\[ \mathbb{E}K_n = \frac{\log n}{\mu} + \frac{\gamma}{\mu} + \frac{1}{\mu}\int_0^1 \log x\,\omega(dx) + \frac{\nu}{2\mu^2} + o(1) \]

and comparing the expectations

\[ \mathbb{E}L_n = \frac{\log n}{\mu} + \frac{\nu}{2\mu^2} - \frac{1}{\mu}\int_0^1 I(x)\,\omega(dx) + o(1), \qquad \mathbb{E}R_n = \frac{\log n}{\mu} + \frac{\nu}{2\mu^2} + o(1) \]

we not only confirm 'by computation' the inequality $\mathbb{E}L_n < \mathbb{E}R_n$ ($= U[0,\log n]$) but also come
to the conclusion that the expected number of component intervals of $[0,\log n]\setminus\mathcal{R}$ which contain no $E_j$'s,
$j \le n$, remains bounded as $n\to\infty$. This conclusion is in good accord with the general point
taken in [?], [?] that the composition $C_n$ is a proper combinatorial analogue of the regenerative
set $\mathcal{R}$.

Acknowledgement. I would like to thank Andrey Levin for help with (18) and further
Euler-summation formulas.